SEM - Path Analysis

Structural equation modeling (SEM) is a collection of statistical techniques that allow a set of relationships between one or more IVs, either continuous or discrete, and one or more DVs,either continuous or discrete, to be examined.

Since the SEM is about exploring relationships among one or more dependent and independent variables , it is important to explore to understand the effect multiple variables on variance and covrariance.

Some Basic Diagrams

Completeness of Path Diagram

Other Assumptions

Following two diagrammes represents practical pehnomennons using path analysis -

Several conventions are used in developing SEM diagrams. Measured variables, also called observed variables, indicators, or manifest variables, are represented by squares or rectangles.

Factors have two or more indicators and are also called latent variables, constructs, or unobserved variables. Factors are represented by circles or ovals in path diagrams.

Independent Variable : In SEM, if there is not arrow point toward a variable can be considered as independent variable.

Dependnet Variables : In SEM, all the variables which has an arrow pointing towards it can be considered as dependent variable.

Moderation, Mediation and Path Analysis

In this section we will brush-up the following relationship -

  1. Moderation

  2. Mediation

Moderation & Mediation

In the following section we will brush up our understanding about moderationa and mediation.

library(ggplot2)
library(plotly)
## Warning: package 'plotly' was built under R version 4.2.3
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
agg <- read.csv("E:/Mine/Books/IT/R-programming/Outline - R Programming/WD/D3.1/aggression.csv") # old aggression data that we discussed earlier 
str(agg)
## 'data.frame':    442 obs. of  5 variables:
##  $ X       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ ID      : int  69 55 7 96 130 124 72 139 102 179 ...
##  $ Aggress : int  13 38 30 23 25 46 41 22 35 23 ...
##  $ Vid_Game: int  16 12 32 10 11 29 23 15 20 20 ...
##  $ CaUnTs  : int  0 0 0 1 1 1 2 3 3 3 ...
plot_ly(agg, x = ~Vid_Game, y = ~Aggress, color = ~CaUnTs, size = ~CaUnTs, type = "scatter")
## No scatter mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode
## Warning: `line.width` does not currently support multiple values.
ggplot(data = agg, aes(x = Vid_Game, y = Aggress)) + geom_point(aes(color = CaUnTs)) + facet_grid(cut(agg$CaUnTs,4)) + geom_smooth(method = "lm", color = "red")
## `geom_smooth()` using formula = 'y ~ x'

model1 <- lm(agg$Aggress ~ agg$Vid_Game + agg$CaUnTs)

summary(model1)
## 
## Call:
## lm(formula = agg$Aggress ~ agg$Vid_Game + agg$CaUnTs)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -27.952  -6.696  -0.168   7.022  32.499 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  21.76433    1.80731  12.042  < 2e-16 ***
## agg$Vid_Game  0.18769    0.06940   2.705  0.00711 ** 
## agg$CaUnTs    0.76312    0.05024  15.191  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.13 on 439 degrees of freedom
## Multiple R-squared:  0.3559, Adjusted R-squared:  0.353 
## F-statistic: 121.3 on 2 and 439 DF,  p-value: < 2.2e-16
model1.1 <- lm(agg$Aggress ~ agg$Vid_Game + agg$CaUnTs + agg$Vid_Game * agg$CaUnTs)

summary(model1.1)
## 
## Call:
## lm(formula = agg$Aggress ~ agg$Vid_Game + agg$CaUnTs + agg$Vid_Game * 
##     agg$CaUnTs)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -29.7144  -6.9087  -0.1923   6.9036  29.2290 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             33.120233   3.427254   9.664  < 2e-16 ***
## agg$Vid_Game            -0.333597   0.150826  -2.212 0.027495 *  
## agg$CaUnTs               0.168949   0.161049   1.049 0.294731    
## agg$Vid_Game:agg$CaUnTs  0.027062   0.006981   3.877 0.000122 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.976 on 438 degrees of freedom
## Multiple R-squared:  0.3773, Adjusted R-squared:  0.373 
## F-statistic: 88.46 on 3 and 438 DF,  p-value: < 2.2e-16

Point to be remembered

  • One need to remember to center the variables.
  • Moderation occurs when the relationship between two variables changes as a function of a third variable. THis shows that regression coefficient between two variables is a function of a third variable.

Mediation

Baron and Kenny suggested that mediation is tested through three linear models:

  1. A linear model predicting the outcome from the predictor variable.
  2. A linear model predicting the mediator from the predictor variable.
  3. A linear model predicting the outcome from both the predictor variable and the mediator.

If the predictor variable must predict the outcome variable less strongly in model 3 than in model 1, the it can be concluded that mediation exist.

Significance of Mediation

Sobel test is performed to assess the significance of the indirect effect. Following statistics can be used to test the significance.

\[indirect-effect(standardized) = \frac{a * b}{\sigma_{dv}}* \sigma_{iv}\]

where,

a and b are the regression coefficients of the indirect path.Further, it can also be represented as follows -

\[P_{m} = \frac{a * b}{c}\]

\[P_{m} = \frac{a * b}{c^\prime}\]

library(ggplot2)
library(plotly)
library(GGally)
## Warning: package 'GGally' was built under R version 4.2.3
stress <- read.csv("E:/Mine/Books/IT/R-programming/Outline - R Programming/WD/D3.1/e_stress.csv")

str(stress)
## 'data.frame':    262 obs. of  7 variables:
##  $ tenure  : num  1.67 0.58 0.58 2 5 9 0 2.5 0.5 0.58 ...
##  $ estress : num  6 5 5.5 3 4.5 6 5.5 3 5.5 6 ...
##  $ affect  : num  2.6 1 2.4 1.16 1 1.5 1 1.16 1.33 3 ...
##  $ withdraw: num  3 1 3.66 4.66 4.33 3 1 1 2 4 ...
##  $ sex     : int  1 0 1 1 1 1 0 0 1 1 ...
##  $ age     : int  51 45 42 50 48 48 51 47 40 43 ...
##  $ ese     : num  5.33 6.05 5.26 4.35 4.86 5.05 3.66 6.13 5.26 4 ...
ggpairs(stress)

Description of Economic Stress dataset :

  • estress - economic stress , the main phenomena
  • ese - economic and social ties (business networking, i.e. no of people respondent physically met on a + talked to over phone and sent an email on Everday day.
  • affect - depression due to economic stress
  • withdraw - closing business
  • tenure - experience in the business
  • Rest of the variables are demographic variables and self explanatory in nature.

So, the strings of relationships starts with economic stress and ends with withdrawl symptoms.

model2 <- lm(withdraw ~ estress, data = stress)
summary(model2)
## 
## Call:
## lm(formula = withdraw ~ estress, data = stress)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4547 -1.2302 -0.2022  0.7978  4.8820 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.06187    0.26202   7.869 9.64e-14 ***
## estress      0.05612    0.05421   1.035    0.302    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.247 on 260 degrees of freedom
## Multiple R-squared:  0.004105,   Adjusted R-squared:  0.0002748 
## F-statistic: 1.072 on 1 and 260 DF,  p-value: 0.3015
model2.1 <- lm(withdraw ~ estress + tenure, data = stress)
summary(model2.1)
## 
## Call:
## lm(formula = withdraw ~ estress + tenure, data = stress)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4925 -1.2293 -0.2005  0.7921  4.8641 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.095708   0.267552   7.833 1.23e-13 ***
## estress      0.058489   0.054393   1.075    0.283    
## tenure      -0.007556   0.011760  -0.643    0.521    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.248 on 259 degrees of freedom
## Multiple R-squared:  0.00569,    Adjusted R-squared:  -0.001988 
## F-statistic: 0.7411 on 2 and 259 DF,  p-value: 0.4776
model2.2 <- lm(withdraw ~ estress + tenure + affect, data = stress)
summary(model2.2)
## 
## Call:
## lm(formula = withdraw ~ estress + tenure + affect, data = stress)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1717 -0.9475 -0.2249  0.8487  2.9045 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.447e+00  2.586e-01   5.596 5.58e-08 ***
## estress     -7.682e-02  5.274e-02  -1.457    0.146    
## tenure      -5.826e-05  1.075e-02  -0.005    0.996    
## affect       7.691e-01  1.037e-01   7.415 1.75e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.135 on 258 degrees of freedom
## Multiple R-squared:  0.1804, Adjusted R-squared:  0.1708 
## F-statistic: 18.92 on 3 and 258 DF,  p-value: 4.003e-11
summary(lm(affect ~ estress , data = stress))
## 
## Call:
## lm(formula = affect ~ estress, data = stress)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0095 -0.4195 -0.1609  0.2498  4.0278 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.79936    0.14331   5.578 6.11e-08 ***
## estress      0.17288    0.02965   5.831 1.63e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6819 on 260 degrees of freedom
## Multiple R-squared:  0.1156, Adjusted R-squared:  0.1122 
## F-statistic:    34 on 1 and 260 DF,  p-value: 1.63e-08
summary((lm(withdraw ~ affect , data = stress)))
## 
## Call:
## lm(formula = withdraw ~ affect, data = stress)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1028 -0.8919 -0.2092  0.8713  2.8713 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.17416    0.17035   6.893 4.13e-11 ***
## affect       0.71772    0.09713   7.389 2.02e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.136 on 260 degrees of freedom
## Multiple R-squared:  0.1735, Adjusted R-squared:  0.1704 
## F-statistic:  54.6 on 1 and 260 DF,  p-value: 2.02e-12
model2.3 <- lm(withdraw ~ estress + tenure + affect + ese, data = stress)
summary(model2.3)
## 
## Call:
## lm(formula = withdraw ~ estress + tenure + affect + ese, data = stress)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1273 -0.9006 -0.2397  0.8421  3.1475 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.776801   0.548843   5.059 8.02e-07 ***
## estress     -0.087544   0.052235  -1.676  0.09496 .  
## tenure      -0.002147   0.010641  -0.202  0.84029    
## affect       0.708135   0.104835   6.755 9.51e-11 ***
## ese         -0.208677   0.076252  -2.737  0.00664 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.121 on 257 degrees of freedom
## Multiple R-squared:  0.2036, Adjusted R-squared:  0.1912 
## F-statistic: 16.42 on 4 and 257 DF,  p-value: 5.394e-12
model2.4 <- lm(withdraw ~ estress + tenure + affect + ese + age, data = stress)
summary(model2.4)
## 
## Call:
## lm(formula = withdraw ~ estress + tenure + affect + ese + age, 
##     data = stress)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1086 -0.9079 -0.2310  0.8442  3.1622 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.9599667  0.6320431   4.683 4.59e-06 ***
## estress     -0.0860641  0.0523622  -1.644  0.10148    
## tenure      -0.0004927  0.0110210  -0.045  0.96437    
## affect       0.7061064  0.1050256   6.723 1.15e-10 ***
## ese         -0.2117236  0.0765256  -2.767  0.00608 ** 
## age         -0.0040984  0.0069859  -0.587  0.55795    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.123 on 256 degrees of freedom
## Multiple R-squared:  0.2046, Adjusted R-squared:  0.1891 
## F-statistic: 13.17 on 5 and 256 DF,  p-value: 2.005e-11
model2.5 <- lm(withdraw ~ estress + tenure + affect + ese + age + sex, data = stress)
summary(model2.5)
## 
## Call:
## lm(formula = withdraw ~ estress + tenure + affect + ese + age + 
##     sex, data = stress)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0123 -0.8897 -0.2264  0.8324  3.2615 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.9522391  0.6322497   4.669 4.89e-06 ***
## estress     -0.0922462  0.0527896  -1.747  0.08177 .  
## tenure      -0.0001811  0.0110287  -0.016  0.98691    
## affect       0.7047681  0.1050607   6.708 1.26e-10 ***
## ese         -0.2157879  0.0766671  -2.815  0.00526 ** 
## age         -0.0046568  0.0070130  -0.664  0.50728    
## sex          0.1355702  0.1448240   0.936  0.35011    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.123 on 255 degrees of freedom
## Multiple R-squared:  0.2074, Adjusted R-squared:  0.1887 
## F-statistic: 11.12 on 6 and 255 DF,  p-value: 5.151e-11

In the context of the above regression results which yielded a non-statisfactory result - we would like to obtain more insight about the roles playes by various variables using the following visualisations.

library(ggplot2)

colnames(stress)
## [1] "tenure"   "estress"  "affect"   "withdraw" "sex"      "age"      "ese"
ggplot(data = stress, aes(x = estress,y = withdraw))+ geom_smooth(method = "lm", color = "red") + facet_wrap(cut(stress$affect,4))

ggplot(data = stress, aes(x = estress,y = withdraw))+ geom_smooth(method = "lm", color = "red") + facet_wrap(cut(stress$ese,4))

ggplot(data = stress, aes(x = estress,y = withdraw))+ geom_smooth(method = "lm", color = "red") + facet_wrap(cut(stress$tenure,4))

ggplot(data = stress, aes(x = estress,y = withdraw))+ geom_smooth(method = "lm", color = "red") + facet_wrap(cut(stress$age,4))

ggplot(data = stress, aes(x = estress,y = withdraw))+ geom_smooth(method = "lm", color = "red") + facet_wrap(stress$sex)

library(psych)

# A Basic / simple version #

model2.6 <- mediate(withdraw ~ estress + affect, data = stress) 

summary(model2.6)
## Call: mediate(y = withdraw ~ estress + affect, data = stress)
## 
## No mediator specified leads to traditional regression 
##           withdraw   se     t  df     Prob
## Intercept     1.45 0.25  5.74 259 2.61e-08
## estress      -0.08 0.05 -1.47 259 1.44e-01
## affect        0.77 0.10  7.46 259 1.29e-12
## 
## R = 0.42 R2 = 0.18   F = 28.49 on 2 and 259 DF   p-value:  6.53e-12

Now you need to develope alternative models , which you think represents the phenomena. Dataset has already been shared with you.